Amazon Mechanical Turk for Subjectivity Word Sense Disambiguation

نویسندگان

  • Cem Akkaya
  • Alexander Conrad
  • Janyce Wiebe
  • Rada Mihalcea
چکیده

Amazon Mechanical Turk (MTurk) is a marketplace for so-called “human intelligence tasks” (HITs), or tasks that are easy for humans but currently difficult for automated processes. Providers upload tasks to MTurk which workers then complete. Natural language annotation is one such human intelligence task. In this paper, we investigate using MTurk to collect annotations for Subjectivity Word Sense Disambiguation (SWSD), a coarse-grained word sense disambiguation task. We investigate whether we can use MTurk to acquire good annotations with respect to gold-standard data, whether we can filter out low-quality workers (spammers), and whether there is a learning effect associated with repeatedly completing the same kind of task. While our results with respect to spammers are inconclusive, we are able to obtain high-quality annotations for the SWSD task. These results suggest a greater role for MTurk with respect to constructing a large scale SWSD system in the future, promising substantial improvement in subjectivity and sentiment analysis.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Unsupervised Disambiguation of Image Captions

Given a set of images with related captions, our goal is to show how visual features can improve the accuracy of unsupervised word sense disambiguation when the textual context is very small, as this sort of data is common in news and social media. We extend previous work in unsupervised text-only disambiguation with methods that integrate text and images. We construct a corpus by using Amazon ...

متن کامل

Clustering dictionary definitions using Amazon Mechanical Turk

Vocabulary tutors need word sense disambiguation (WSD) in order to provide exercises and assessments that match the sense of words being taught. Using expert annotators to build a WSD training set for all the words supported would be too expensive. Crowdsourcing that task seems to be a good solution. However, a first required step is to define what the possible sense labels to assign to word oc...

متن کامل

Ideological Perspective Detection Using Semantic Features

In this paper, we propose the use of word sense disambiguation and latent semantic features to automatically identify a person’s perspective from his/her written text. We run an Amazon Mechanical Turk experiment where we ask Turkers to answer a set of constrained and open-ended political questions drawn from the American National Election Studies (ANES). We then extract the proposed features fr...

متن کامل

Cheap and Fast - But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks

Human linguistic annotation is crucial for many natural language processing tasks but can be expensive and time-consuming. We explore the use of Amazon’s Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web. We investigate five tasks: affect recognition, word similarity, recognizing textual en...

متن کامل

Anveshan: A Framework for Analysis of Multiple Annotators' Labeling Behavior

Manual annotation of natural language to capture linguistic information is essential for NLP tasks involving supervised machine learning of semantic knowledge. Judgements of meaning can be more or less subjective, in which case instead of a single correct label, the labels assigned might vary among annotators based on the annotators’ knowledge, age, gender, intuitions, background, and so on. We...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010